Dimension-based Quality Modeling of Transmitted Speech by Marcel Wältermann

Author: Marcel Wältermann
Language: eng
Format: epub
Publisher: Springer Berlin Heidelberg, Berlin, Heidelberg


3.6.2.3 Discussion

The detailed analysis of the NB and WB perceptual spaces revealed that in both cases, three common orthogonal features are dominant: “discontinuity”, “coloration”, and “noisiness”. In physical terms, the first two dimensions might relate to degradations of the speech signal in the time and frequency domain, respectively, whereas noise is considered separately. In the WB scenario, an additional WB-specific dimension is present: “HF Distortion”. However, this dimension covers the least variance and is therefore the least important.
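The notion of the variance a dimension covers can be illustrated with a minimal sketch. It assumes a PCA-style decomposition of a condition-by-attribute rating matrix. The data are synthetic, and the function name `explained_variance_ratio`, the matrix sizes, and the scale factors are hypothetical choices for illustration, not values or procedures taken from the experiments discussed here.

```python
import numpy as np


def explained_variance_ratio(ratings):
    """Return the share of total variance covered by each orthogonal component.

    `ratings` is a (conditions x attributes) matrix. This is an assumed,
    PCA-style illustration, not the analysis procedure of the source.
    """
    centered = ratings - ratings.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic example: four latent dimensions with decreasing strength,
# observed through ten hypothetical attribute scales plus a little noise.
rng = np.random.default_rng(0)
n_conditions, n_latent, n_attributes = 60, 4, 10
latent = rng.normal(size=(n_conditions, n_latent)) * np.array([3.0, 2.0, 1.5, 0.5])
loadings = rng.normal(size=(n_latent, n_attributes))
ratings = latent @ loadings + 0.1 * rng.normal(size=(n_conditions, n_attributes))

ratio = explained_variance_ratio(ratings)
print(np.round(ratio, 3))
```

The components come out ordered by decreasing variance share, so a dimension of least importance in this sense is simply the last retained component.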

In principle, similar dimensions have been found in the related literature presented in Sect. 3.3, albeit in different contexts. As an example, Mattila (2001) found five dimensions for clean speech signals in the context of mobile communications. Two of them, “noisy” and “dark–bright”, can be directly associated with “noisiness” and “coloration”, respectively. “Smooth–fluctuating–interrupted” and “bubbling” are merged into “discontinuity” here, which reflects both interruptions due to packet loss and time-variant coding effects, for example. Mattila (2001), however, found a further dimension, “Synthetic–natural”.

The dimensions that were found here capture effects that are caused by a wide variety of distortions (cf. condition list in Sect. 3.4). Although many conditions can intuitively be assigned to the dimensions (e.g., packet loss to “discontinuity”, or bandpass-filtering to “coloration”), the analyses revealed perceptual effects not apparent at first sight, yet well understandable at second glance.

The coded speech conditions stimulate all dimensions, with the degrees depending on the coding technique. For example, besides a certain coloration that codecs provoke in both the NB and WB scenarios, the G.726 codec is related to “noisiness” due to the signal-correlated noise it introduces. Low-bitrate codecs like the AMR-WB are related to the “discontinuity” dimension (cf. “bubbling” in Mattila 2001). Other studies that consider only the multidimensional nature of codecs confirm the finding that codecs load on the dimensions “noisiness” and “coloration”, since similar dimensions were found there: “noisiness” and “low-frequency content” (Hall 2001), and “Color of Sound” and “Noise” (MDS in Bappert and Blauert 1994); see Sect. 3.3. In both investigations, however, the most important dimension represents “naturalness”, mainly reflecting the overall quality, or evaluation (cf. discussion in Sect. 3.3). The “discontinuity” dimension might be hidden there.

Whereas codecs are perceived as rather similar in the NB case, they are more widely spread in the WB case. Here, the codec distortions are evidently more easily perceivable, owing to the wider bandwidth. However, the discrimination power of the SD technique seems to be higher for subtle effects like coding distortions than that of pairwise similarity comparison, although the differential sensitivity of the human hearing system is commonly higher than its absolute sensitivity (see Sect. 2.4.3.2 and Möller 2000, p. 49). Perhaps the diverse set of pre-defined attributes helps the participants to identify the features. The MDS solution seems to emphasize more severe effects and appears to be less sensitive to subtle differences.
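For readers less familiar with how pairwise (dis)similarity judgments are turned into a perceptual space, the following minimal sketch shows classical (Torgerson) MDS. It is a stand-in chosen for brevity: the MDS variant actually used in the study is not restated in this section. The function names and the four toy points are hypothetical.

```python
import numpy as np


def classical_mds(dissimilarities, n_dims):
    """Embed items in `n_dims` dimensions from a symmetric dissimilarity matrix.

    Classical (Torgerson) MDS, used here only as an illustrative stand-in.
    """
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues; clip tiny negatives from rounding.
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy example: four hypothetical stimuli placed in a 2-D "perceptual" plane.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
d = pairwise_distances(points)
coords = classical_mds(d, n_dims=2)
```

The recovered configuration matches the original only up to rotation, reflection, and translation, so it is the inter-point distances that should be compared. Because the solution is driven by the largest eigenvalues, large dissimilarities dominate the embedding, which is in line with the observation that the MDS solution emphasizes more severe effects.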

Spectral effects stemming from the use of hands-free terminals in reverberant rooms at the send side lead to strong “coloration” in the NB scenario. In contrast, the resulting perceptual effects are less prominent in the WB case.


